ABSTRACT

Methodological deficiencies inherent in expert-novice 
reading research make it impossible to draw inferences about 
curriculum change. First, comparisons of intact groups are often used 
as a basis for making causal inferences about how observed 
characteristics affect behaviors. While comparing different groups is 
not by itself a useless activity, progressing directly to training is 
premature at best. Second, the think-aloud protocol technique is
often used for inferring a subject's cognitive structure of subject 
matter. This method is inappropriate because it assumes that the 
organization of this structure resides consciously in a person's mind 
and can be verbally reproduced. Third, retrospective methods have 
been employed to infer causality by selecting groups currently 
differing and discovering differences in their past on putative 
causal variables, which are then inferred to have caused the present 
differences. While this technique must be used in historical 
analyses, it becomes suspect when the inferences are used to 
speculate on implications for current practice. Finally, techniques 
employed in naturalistic inquiry often confuse a change in 
methodology with a change in the discipline being studied, and rely 
heavily on impressionistic, one-shot observation for many facts. 
(JD) 
Methodological limitations of the application
of expert systems methodology in reading

The comparison between the expert and inexpert, whether in 
organizations or in humans, has become a major research technique 
in education in the last decade. In research on problem solving 
comparisons have been made between physicists and physics
students. In reading research, much of the information-processing
theory being advanced rests on a basic comparison between good and
poor readers. In educational administration, successful schools
are compared with unsuccessful schools. Creative students are 
compared with normal students. In all of these areas a similar 
paradigm is being used: the expert system is compared on a number 
of attributes with a less or inexpert system. Some attributes are 
assumed to be causal (such as program differences or strategies 
employed) and others are assumed to be outcomes, such as achieve- 
ment or time to solution of a problem. Differences between the 
expert and inexpert on the causal variables are then taken as
evidence that those variables cause the outcome differences, and
these salient variables are promoted as efficacious for remedying
the deficiencies of the inexpert. This paradigm is an important
departure from the dominant experimental paradigm used in
education, and this paper critically examines its methodological
limitations.

It is interesting that the expertise methodology has sprung from
two quite different areas of research: cognitive psychology and
curriculum behaviorism. Cognitive psychology has drawn from, and
has itself influenced, artificial intelligence (AI) research in
machine computing. As Chi, Feltovich & Glaser (1982) have noted, a
shift took place in AI research from power strategies to knowledge
strategies. The best
knowledge strategies available for study were found in human 
beings, so that new computer programs were developed to imitate 
the way human experts organized and processed information, as 
could be determined in comparisons with inexpert humans. The 
effective schools movement, not directly influenced by artificial 
intelligence in any obvious way, sought to study the school 
environment. One outcome of such study, conducted by
anthropologists, psychologists, and educational researchers, was
the finding that some schools (classrooms, teachers,
administrators, etc.) were superior in performance to others.
Comparisons between variables defined as input, or causal, and as
output, or dependent, led to prescriptions for change in inexpert
schools to make them more like the best schools being observed.
For example, Clark & McCarthy (1983) reported on a
cohort-sequential design in which volunteer New York City schools
implemented a new program based on the effective schools
literature.

Because most of the expert-novice comparisons were presented in
the methodological trappings of experimental research (ANOVA
statistical analysis and interpretation), there have been only one
or two serious evaluations of the causal logic underlying the
method and its basis for making causal inferences (Rowan, Bossert
& Dwyer, 1983). The main thesis presented here is that all
research applying this method suffers from internal validity flaws
sufficiently serious to render it uninterpretable. Furthermore, the
expertise method is incapable of supporting causal inference
regarding change in the inexpert without true experimental
research.

Techniques employed 
The techniques being drawn upon in research on expertise
include, but are not limited to, the following: comparisons
between intact groups; think-aloud protocol; naturalistic inquiry,
including ethnographic field method; and retrospective research.
Some researchers employ experimental research as part of their
research strategies; their applications are specifically exempted
from the criticism levelled here and will be mentioned as
exemplars of appropriate or adequate research.

Comparison between intact groups. This technique is widely used
in expertise research. In problem-solving research, Chase & Simon
(1973) compared the ability of chess masters and novices to chunk
board groupings. They found more elements in the masters' chunks
than in the novices'. Simon & Simon (1978) found differences in
the problem-solving behaviors of physicists and physics students
using verbal protocols of the tasks each performed when solving
novel problems. The effective schools movement has used
comparisons between schools defined as outstanding or excellent
and those defined as inferior or deficient to make programmatic
decisions about how schools ought to be run. A well-constructed
criticism of the effective schools research methodology was made
by Rowan, Bossert, & Dwyer (1983). Their specific points, which
will be incorporated into this review, include difficulty with
causal ordering, instrumentation, limitations of generalization,
and nonequivalent control group comparisons.

The effective schools research of the 1970s was employed in
examining reading at both the school and classroom level. Teacher
effectiveness has been particularly emphasized (Rupley, Wise, &
Logan, 1986). Brophy's (1973) work on process-product research
with primary grade teachers is a widely cited example; later
important studies include Medley (1977) and Rosenshine (1978),
the latter of which raised the problem that effectiveness research
had received little experimental verification. The Stanford
Program on Teaching Effectiveness (Crawford, Gage, Corno,
Staybrook, Mitman, Schunk, Stallings, Baskin, Hanvey, Austin &
Newman, 1978) and the First Grade Reading Group Study (Anderson,
Evertson & Brophy, 1978) are experimental or quasi-experimental
studies based on initial observations contrasting good and poor
teachers (Rupley et al., 1986). It is important to note that in
these studies curriculum recommendations were made after a
comparative intervention had been carried out, not directly on
the basis of the original comparisons.

Much of the recent research on reading from a cognitive 
perspective is based on comparisons between good and poor read- 
ers. For example, in a recent article by Underwood & Zola (1986) 
good and poor readers were compared on letter recognition span. 
In this study no differences were found and no particular 
instructional inferences were made. In other studies this has not 
been the case: McGee (1982) compared good and poor fifth grade
readers and poor third grade readers, finding differences in
recall of text structure ordered from good fifth graders, to poor
fifth graders, to third graders. McGee concluded that young
readers "benefit from following the top-level structure of text to
guide reading and remembering passage information..." Even though
a disclaimer below this quote suggests the need for more research
on the effectiveness of instruction, there is a clear message that
the observed differences are caused by what good readers do, and
that poor readers will be helped by some strategy based on the
good readers' processes. While the study is itself limited by the
text reading level (third grade), it is part of a chain of
research related to automaticity (LaBerge & Samuels, 1974), which
is itself based in part on these same good-poor reader
differences. There is simply no basis for assuming that the poor
readers can be made to perform like the good readers, that their
processing will become automatic, or that, if it does, it will be
automatic in the same way that the good readers' processing is.
Another such study is that of Sannomiya (1984), in which poor
third grade comprehenders were compared with good sixth grade
comprehenders on text comprehension under auditory or visual
conditions. In this study age and ability are confounded. Again,
there is no evidence that the poor comprehenders can be made to
look like good sixth graders, or that different modes of
presentation will change reading performance in this direction.

Intact groups are also used as the basis for inferring 
developmental change. For example, Baldwin & Coady (1978) compared
fifth graders and college students on their use of punctuation
as clues to meaning in isolated sentences. They found differences
between the groups and inferred developmental differences in the
use of punctuation as clues to meaning. It is common to
see studies that mix age and reading ability. Juel (1983) com- 
pared grade two, grade five, and upper division undergraduates; 
good and poor readers were identified for the elementary groups. 
In this study the word adult was used interchangeably with the 
college sample, the implication being that these readers are a 
norm for adult performance. This assumption is most definitely 
wrong, and an assumption that the elementary students are likely 
to or can become like these adults is unwarranted. Juel
nevertheless suggests that presenting children with practice words
with similar letter combinations would help to develop versatility
in decoding. That may be true, but the comparisons made in her
study do not support such a conclusion. There are many other
child-adult comparisons in recent literature in which the adults
are high ability college students (McGee, 1982; Schwartz, 1980;
Taylor, 1980) or secondary students (Fletcher, Satz, & Scholes,
1981).

The use of intact groups has been repeatedly criticized in 
the educational research methodology literature from Campbell & 
Stanley (1963) onward with respect to the inference of causality 
for observed characteristics affecting behaviors. In the case of 
good versus poor readers, the inference is that what good readers
do, poor readers can do, and that instruction directly oriented
toward the discrepancy will remediate deficiencies in the poor
readers. The good readers are the experts, and the poor readers
the novices. The critical assumption is that the good readers
were themselves in the poor readers' state at some point. Often,
since the two groups are age matched, this is not true: the good
readers were never like the poor readers. Consequently, the
inference that the observed differences in condition can lead to
training is by itself without basis. Similarly, developmental
studies are susceptible to the same difficulty, especially when
they involve elementary, secondary, and college populations. In
the Baldwin & Coady (1978) study, a comparison between fifth grade
and college students is meaningless, because differences may be
due to selection: if one were able to select the fifth graders
who will eventually go to college, would we still see the
differences in use of punctuation clues? Even if we did find the
differences, how comfortable would we be in ignoring any other
differences that remain between the prospective college-bound
fifth graders and the college students? Any variables upon which
the two groups differ become possible alternative causal
variables, and training in the absence of experimental
demonstration is merely guesswork. A similar problem exists for
comparisons with secondary students when the dropout rate becomes
appreciable (after grade 10), or when students begin
self-selecting into courses (grade 9). Differential maturation and
history are other relevant threats from the Campbell & Stanley
list. Finally, regression threats due to selection of extremes are
omnipresent in good-poor comparisons, and their effects should
always be estimated statistically, if only to provide a comparison
with the observed differences.
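The regression artifact can be estimated before any instruction is attempted. The following Python sketch assumes the classical test theory model, in which a group selected for extreme scores on one fallible test is expected, on a parallel retest, to move toward the population mean by a fraction equal to one minus the test's reliability. The function names, the normal score distributions, and the reliability of .80 are illustrative assumptions, not values from any study cited here.

```python
import random
import statistics


def expected_retest_mean(selected_mean, population_mean, reliability):
    """Expected retest mean for a group selected on one fallible test.

    Under the classical (bivariate-normal) model, the group's expected
    deviation from the population mean shrinks by the reliability factor.
    """
    return population_mean + reliability * (selected_mean - population_mean)


def regression_artifact(selected_mean, population_mean, reliability):
    """Apparent change expected from regression alone, with no treatment."""
    return (expected_retest_mean(selected_mean, population_mean, reliability)
            - selected_mean)


def simulate_poor_readers(n=20000, reliability=0.8, cutoff=0.2, seed=1):
    """Select the lowest-scoring fraction on test 1, then retest them.

    True scores ~ N(0, 1); each test adds independent normal error whose
    variance makes var(true) / var(observed) equal `reliability`.
    Returns (mean on test 1, mean on test 2) for the selected group.
    """
    rng = random.Random(seed)
    error_sd = ((1 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [t + rng.gauss(0, error_sd) for t in true]
    test2 = [t + rng.gauss(0, error_sd) for t in true]
    # "Poor readers" are defined by the first testing only.
    ranked = sorted(range(n), key=lambda i: test1[i])
    poor = ranked[: int(n * cutoff)]
    return (statistics.fmean(test1[i] for i in poor),
            statistics.fmean(test2[i] for i in poor))


if __name__ == "__main__":
    print(regression_artifact(70, 100, 0.8))
    m1, m2 = simulate_poor_readers()
    print(f"test 1 mean: {m1:.2f}, retest mean: {m2:.2f}, "
          f"predicted retest: {expected_retest_mean(m1, 0, 0.8):.2f}")
```

For example, poor readers selected at a mean of 70 on a scale whose mean is 100, using a test with reliability .80, are expected to score about 76 on retest: a 6-point "gain" with no instruction at all. Any remedial effect claimed for such a group has to exceed this baseline.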

Comparing different groups is not by itself a useless
activity, but progressing directly to training is premature at
best. Differences between good and poor readers, or between
developmentally different groups of readers, are useful for
supplying clues or hints for more careful investigation. The
tendency to assume a causal shortcut, one that permits
experimentation to be skipped, is unfortunate; while the technique
may prove correct in a few instances, our experience in
educational research with intact groups is lengthy enough to
predict many erroneous conclusions and wasted resources if the
method is allowed to predominate.
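The danger of the causal shortcut can be illustrated with a small simulation. This Python sketch assumes a hypothetical model in which an unmeasured variable (labelled `ability` here) drives both a reader's strategy use and comprehension, while strategy itself has no causal effect unless `strategy_effect` is set. A median split on comprehension, the intact-group comparison, shows a large strategy gap between good and poor readers; randomly assigning strategy training, the experimental comparison, shows essentially none. All names and effect sizes are illustrative assumptions.

```python
import random
import statistics


def comprehension(ability, strategy, rng, strategy_effect=0.0):
    """Hypothetical outcome: ability drives comprehension; strategy only
    matters if strategy_effect is nonzero."""
    return ability + strategy_effect * strategy + rng.gauss(0, 0.5)


def draw_reader(rng):
    """Return (ability, strategy); strategy use merely tracks ability."""
    ability = rng.gauss(0, 1)
    return ability, ability + rng.gauss(0, 0.5)


def intact_group_gap(n=10000, seed=3, strategy_effect=0.0):
    """Good-minus-poor difference in mean strategy use (median split)."""
    rng = random.Random(seed)
    readers = []
    for _ in range(n):
        ability, strategy = draw_reader(rng)
        score = comprehension(ability, strategy, rng, strategy_effect)
        readers.append((strategy, score))
    cut = statistics.median(score for _, score in readers)
    good = [s for s, score in readers if score >= cut]
    poor = [s for s, score in readers if score < cut]
    return statistics.fmean(good) - statistics.fmean(poor)


def randomized_training_effect(n=10000, seed=4, boost=1.0,
                               strategy_effect=0.0):
    """Treated-minus-control comprehension when training is randomly assigned.

    Training raises strategy use by `boost` for a random half of readers.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability, strategy = draw_reader(rng)
        if rng.random() < 0.5:
            treated.append(comprehension(ability, strategy + boost, rng,
                                         strategy_effect))
        else:
            control.append(comprehension(ability, strategy, rng,
                                         strategy_effect))
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    print(f"intact-group strategy gap: {intact_group_gap():.2f}")
    print(f"randomized training effect: {randomized_training_effect():.2f}")
```

Setting `strategy_effect` to a nonzero value lets the randomized comparison recover that effect, while the intact-group gap is large in both cases; only the experiment distinguishes a world where strategy matters from one where it merely travels with ability.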

Similarly, research on developmental differences has largely 
opted for cross-sectional designs, not wishing to do the hard 
research implicit in true longitudinal study of development. In 
the good reader-poor reader research this is particularly 
telling, for we have little data on the long-term development of
either group from a cognitive, information-processing perspective.
This is the causal ordering problem that Rowan et al. (1983)
pointed out; cross-sectional designs that substitute for
longitudinal designs almost always have this difficulty.

Think-aloud protocol. This technique has been used in the study
of expert and novice organization of knowledge and was eloquently
and favorably defended by Ericsson & Simon (1980) as a valid
means to record information that humans are attending to in
short-term memory. It was attacked by Phillips (1983) as an
inappropriate technique for inferring humans' cognitive structure
of subject matter. The core of Phillips' argument is that the
external organization imposed in the learning required for a task
may require a person to reproduce it verbally, but there is no
evidence that that organization resides internally in the
person's mind. Similarly, the content and organization of a
question, with the possible exception of free response, impose an
organization on the subject's response that does not necessarily
mirror the internal representation of the response. The
think-aloud method, while it occurs in a variety of research
contexts, is a major technique in naturalistic or ethnographic
research. A recent study by Nicholson (1984), in which 3,600
minutes of interviewing with junior high students were conducted,
is an example in point. This study will be examined in more detail
below, but interview techniques in the comparison of experts and
novices are likely to suffer from many difficulties. In reading
they are particularly problematic because the researchers usually
share the same culture (reading, education, etc.) as both the
experts and the novices. Such a shared culture is usually a
drawback for ethnographers, who are attempting to view the culture
with fresh eyes. In Nicholson's work the experts were teachers,
and the similarity between researcher and expert was far greater
than between researcher and novices (teenagers). A shared language
of educationese is quite troublesome for a researcher in such
conditions, and the trustworthiness of such interviewing must be
questioned; it is not that interviewing cannot be done well, but
that great care must be taken to support the evidence presented
in such a context.



Retrospective studies. In research on creativity the comparison
between creative and noncreative individuals has led to the
formulation of programs to teach creativity (Van Tassel-Baska,
1986). Also, researchers on creativity have employed retrospective
methods to examine prior differences between more and less
creative individuals and then to propose changes in education
which are expected to engender the same effects in young students
as were observed in the creative adults. Segal, Busse & Mansfield
(1981) retrospectively compared two groups of biologists, highly
cited and not highly cited, using a self-report survey technique.
They found postdoctoral productivity to be related to predoctoral
productivity and high school science interest.

As the example above illustrates, this research technique is used
to infer causality by selecting groups currently differing and
discovering differences in their past on putative causal
variables, which are then inferred to have caused the present
differences. This technique is apparently not used very much in
reading research: a search over the last ten years found only one
study, by Castagna (1982), in which a historical examination of
influential persons in western history was made using biographies
and autobiographies. The implication is that decisive reading
changed these people, and that some of this reading was purposeful
and some was not. Of course, historical analyses must use such
methods; only when an implication for current practice is drawn
does the analysis become suspect.

Naturalistic inquiry. This body of techniques, attempting to
become a method in educational research in Kaplan's (1964)
sense, draws upon ethnographic research from cultural
anthropology, but then departs from it in a philosophical sense.
Recent apologies by Harste (undated) and Weaver (198b) liken the
use of naturalistic inquiry to a paradigm shift, citing Kuhn's now
dated and largely refuted work (1963). While this debate more
properly belongs in a different critical paper, the use of the
techniques in the expert-novice studies requires a small aside.
The appeal to a paradigm shift has been misunderstood and
misplaced to boot. The shift occurred in psychology in the late
1960s and is often tied to Neisser's (1967) resurrection of
internal mental representational constructs, the shift being away
from behaviorism. This paradigm shift has flowed into educational
research rapidly and convincingly, predating the widespread
interest in ethnographic techniques by a decade. The latter
interest, it is presumed, was an outgrowth of the real paradigm
shift. Paradigm shifts occur in disciplines when the prevailing
theories are overturned by new, revolutionary ones that
nevertheless account for the facts and relationships previously
learned. In paradigm shifts the old is not discarded; it is
reinterpreted. There is no such change occurring in reading,
notwithstanding the wishful thinking of Harste (undated). The
mistake is in confusing a change in methodology with a change in
the discipline. Methodologies cannot and never will drive
disciplines to the extent that the naturalistic inquirers
maintain that they do; recent arguments by Kuhn (1976) himself
have backpedalled on the theory-ladenness argument about data.
Cook & Campbell (1979) attack the emphasis by philosophers of
science on the preeminence of theory, which relegates facts to an
unwarranted secondary status. That is, facts are observed by
researchers working from different methodological perspectives.
They must
reconcile them; their methodologies become more suspect than the
facts, which are interobserver confirmable. If the facts are not
confirmable, then they cannot be admitted. This latter issue
becomes the main problem for the naturalistic researchers, for
they rely heavily on impressionistic, one-shot observation for
many facts. Many researchers using this method reject
intersubjective confirmability, but in doing so they abandon
science for art. They are not wrong; they merely inquire in
another domain.

A number of naturalistic studies in reading have been
published in the last few years. The study by Nicholson (1984)
is the primary study I have encountered which purported to
compare experts and novices. The study actually examined the
structures of teenagers' understanding of classroom material;
teachers were apparently ignored, although there is an appeal to
teachers as experts at the end of the study. A small section on
low achievers was also tacked on. Either the catchy title was
misleading or there was a serious editing problem, because there
was no comparison between experts and novices in this study. Had
there been one, it would have told us nothing about how to change
students' conceptions. This is a common problem in naturalistic
studies: one gets whatever one happens to find in the setting. If
there is nothing very interesting going on, little of use will be
brought out. Also, naturalistic studies are limited by what
passes for actual practice, not by what is possible. It is quite
possible that most of what will occur in education in the next
century is being tested in laboratory schools, industrial
settings, and nontraditional educational locations. The public
schools are likely to be the last places to find out about these
changes, whether through experimental or naturalistic means.

Naturalistic research on expert-novice differences in reading
is limited by selection, i.e., the choice of locations; by
history, the context of the location; by instrumentation,
especially changes in the observer/interviewer; and by temporal
limitations in when the study is conducted and for how long. It
is not argued here that naturalistic inquiry is less appropriate
than the quasi-experimental research described earlier. Neither
is likely to be able to draw valid conclusions regarding
curriculum change in the absence of careful experimental
manipulation of variables.

Summary 

This paper has sought to draw attention to methodological
deficiencies inherent in expert-novice research with respect to
drawing inferences about curriculum change. Much credit must be
given to the reading research community for generally not leaping
to conclusions from such literature, in comparison with some
fields of psychology, engineering, and science education. While
some reading studies seem to overreach in their conclusions, far
more have used the observed differences to probe experimentally
the hypotheses generated by the observations. This approach cannot
be faulted, even if one cannot resist challenging the original
premise: that good readers can tell us anything about how poor
readers ought to proceed. The threats to the internal validity of
such research ventures (history, selection, instrumentation,
maturation, and regression) should make us pause to consider
whether good-poor or expert-novice comparisons are really of
value. While no single study is necessarily damned by possible
threats to internal validity, the weight of methodological
argument certainly counsels caution. Ex post facto methods, such
as meta-analysis, can never rectify a poor initial choice of field
of exploration. If we want to see how poor readers can be made
into good readers, we ought to find examples, or better yet,
create examples, and then work to find out what is replicable.
That is good science and good research.